About the Provider

Alibaba Cloud is the cloud computing arm of Alibaba Group and the creator of the Qwen model family. Through its open-source initiative, Alibaba has released state-of-the-art language and multimodal models under permissive licenses, enabling developers and enterprises to build powerful AI applications across diverse domains and languages.

Model Quickstart

This section helps you quickly get started with the Qwen/Qwen3-VL-235B-A22B-Thinking model on the Qubrid AI inferencing platform. To use this model, you need:
  • A valid Qubrid API key
  • Access to the Qubrid inference API
  • Basic knowledge of making API requests in your preferred language
Once authenticated with your API key, you can send inference requests to the Qwen/Qwen3-VL-235B-A22B-Thinking model and receive responses based on your input prompts. The example below shows how the model can be accessed from Python using the OpenAI-compatible SDK; adapt it to whichever environment best fits your workflow.
from openai import OpenAI

# Initialize the OpenAI client with Qubrid base URL
client = OpenAI(
    base_url="https://platform.qubrid.com/v1",
    api_key="QUBRID_API_KEY",  # replace with your actual Qubrid API key
)

# Create a streaming chat completion
stream = client.chat.completions.create(
    model="Qwen/Qwen3-VL-235B-A22B-Thinking",
    messages=[
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image? Describe the main elements."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ],
    max_tokens=4096,
    temperature=0.7,
    top_p=0.9,
    stream=True
)

# With stream=True, print tokens as they arrive
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

# With stream=False, the call returns a single completion instead;
# read it with:
#   print(response.choices[0].message.content)
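The quickstart above passes a public image URL. OpenAI-compatible vision endpoints also commonly accept images inlined as base64 data URLs; assuming the Qubrid endpoint follows that convention, a local image file can be prepared like this (the helper names below are illustrative, not part of any SDK):

```python
import base64
import mimetypes

def to_data_url(path: str) -> str:
    """Read a local image file and return it as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime or 'image/jpeg'};base64,{encoded}"

def image_message(prompt: str, image_path: str) -> dict:
    """Build a user message with the same structure as the quickstart,
    but with a local image inlined instead of a remote URL."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": to_data_url(image_path)}},
        ],
    }
```

Pass the result as `messages=[image_message("Describe this image.", "photo.jpg")]` to `client.chat.completions.create`, exactly as in the quickstart.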

Model Overview

Qwen3-VL-235B-A22B-Thinking is the most powerful vision-language model in the Qwen series.
  • With 235B total parameters and 22B active per token, it excels in multimodal STEM and math reasoning, visual agent tasks, GUI automation, spatial perception, long video comprehension, and multilingual OCR across 32 languages.
  • Its thinking mode enables deep chain-of-thought reasoning over complex visual inputs, with a 256K native context window expandable to 1M tokens.

Model at a Glance

| Feature | Details |
| --- | --- |
| Model ID | Qwen/Qwen3-VL-235B-A22B-Thinking |
| Provider | Alibaba Cloud (Qwen Team) |
| Architecture | Sparse MoE Transformer with DeepStack multi-level ViT feature fusion and Interleaved-MRoPE for video temporal reasoning |
| Model Size | 235B total / 22B active |
| Context Length | 256K tokens (up to 1M) |
| Release Date | 2025 |
| License | Apache 2.0 |
| Training Data | Large-scale multimodal dataset across 32 languages; RL post-training with thinking mode for deep reasoning |

When to use?

You should consider using Qwen3-VL-235B-A22B-Thinking if:
  • You need visual STEM and math reasoning with deep chain-of-thought
  • Your application requires GUI automation or visual agent tasks
  • Your use case involves multimodal coding from images or video
  • You need long video understanding and temporal reasoning
  • Your workflow requires multilingual OCR across 32 languages
  • You need 3D grounding and spatial reasoning over visual inputs

Inference Parameters

| Parameter Name | Type | Default | Description |
| --- | --- | --- | --- |
| Streaming | boolean | true | Enable streaming responses for real-time output. |
| Temperature | number | 0.7 | Controls randomness in output. |
| Max Tokens | number | 4096 | Maximum number of tokens to generate. |
| Top P | number | 0.9 | Controls nucleus sampling. |
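The defaults in the table above can be gathered into a small helper so that every request starts from the documented values and overrides stay explicit; the `build_request` function below is an illustrative sketch, not part of the Qubrid SDK:

```python
def build_request(prompt: str, **overrides) -> dict:
    """Assemble chat-completion kwargs from the documented defaults.

    Keyword arguments override any default, e.g. temperature=0.2
    for more deterministic output.
    """
    params = {
        "model": "Qwen/Qwen3-VL-235B-A22B-Thinking",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 4096,   # Max Tokens default
        "temperature": 0.7,   # Temperature default
        "top_p": 0.9,         # Top P default
        "stream": True,       # Streaming default
    }
    params.update(overrides)
    return params
```

With a client configured as in the quickstart, a call then becomes `client.chat.completions.create(**build_request("Hello"))`.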

Key Features

  • Thinking Mode: Built-in chain-of-thought reasoning for deep multimodal problem solving across STEM, math, and visual tasks.
  • DeepStack Multi-Level ViT Fusion: Multi-level visual feature fusion for fine-grained image and document understanding.
  • Interleaved-MRoPE: Advanced positional encoding for precise video temporal reasoning across long sequences.
  • 256K Native Context: Supports up to 1M tokens — enabling long video comprehension and large document analysis.
  • Rivals Gemini 2.5 Pro: Competitive on perception and multimodal reasoning benchmarks at open-weight scale.
  • Multilingual OCR: Accurate text recognition across 32 languages in images and documents.
  • Apache 2.0 License: Fully open source with full commercial freedom.
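Qwen thinking-mode models typically wrap their chain-of-thought in `<think>...</think>` tags ahead of the final answer. Assuming the Qubrid endpoint returns the raw text in that form (verify against actual responses), the two parts can be separated with a helper like this:

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer).

    Assumes the model's chain-of-thought is delimited by
    <think>...</think> tags; if no such block is present,
    the whole text is treated as the answer.
    """
    start, end = "<think>", "</think>"
    if start in text and end in text:
        reasoning = text.split(start, 1)[1].split(end, 1)[0].strip()
        answer = text.split(end, 1)[1].strip()
        return reasoning, answer
    return "", text.strip()
```

This lets an application log or display the reasoning separately while showing users only the final answer.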

Summary

Qwen3-VL-235B-A22B-Thinking is the flagship vision-language model of the Qwen series, built for deep multimodal reasoning.
  • It uses a Sparse MoE Transformer with DeepStack ViT fusion and Interleaved-MRoPE, with 235B total and 22B active parameters per token.
  • It rivals Gemini 2.5 Pro on perception benchmarks and leads in GUI automation, visual STEM reasoning, and multilingual OCR.
  • The model supports 256K native context (up to 1M), thinking mode for chain-of-thought reasoning, and 32 languages.
  • Licensed under Apache 2.0 for full commercial use.